Compiler-Assisted Sub-Block Reuse

نویسندگان

  • Jian Huang
  • David J. Lilja
چکیده

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various alternatives to exploit this value locality, such as value prediction and value reuse. Value prediction improves the available Instruction-Level Parallelism (ILP) by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value reuse, on the other hand, tries to remove redundant computations by buffering the previously produced results of instructions and skipping the execution of instructions with repeating inputs. Previous value reuse mechanisms use a single instruction or a naturally formed instruction group, such as a basic block, a trace, or a function, as the reuse unit. These naturally formed instruction groups are readily identifiable by the hardware at run-time without compiler assistance. However, the performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse unit. Since larger instruction groups typically have fewer reuse opportunities than smaller groups, but also provide greater benefit for each reuse-detection process, it is very important to find the balance point that provides the largest overall performance gain. In this paper, we propose a new mechanism called sub-block reuse to intelligently group instructions into reuse units using compiler-assistance. The goal is to balance the reuse granularity and the number of reuse opportunities. The sub-blocks are produced by slicing the basic blocks at compile-time using appropriate dataflow considerations. Our simulations with the SPECint95 benchmarks show that sub-block reuse with compiler assistance has a substantial and consistent potential to improve the performance of superscalar processors with speedups ranging from 10-22%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various techniques, such as value prediction and value reuse, to exploit this behavior. Value prediction improves the available Instruction-Level Parallelism (ILP) in superscalar processors by allowing dependent instructions to be executed speculatively after predicting the values of the...

متن کامل

Improving Processor Performance Through Compiler-Assisted Block Reuse

Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed in...

متن کامل

Extending Value Reuse to Basic Blocks with Compiler Support

Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instructio...

متن کامل

Exploiting Basic Block Value Locality with Block Reuse

Value prediction at the instruction level has been introduced to allow more aggressive speculation and reuse than previous techniques. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable, suggesting that using compiler support to extend value prediction and reuse to a coarser granularity may have substantial performance bene...

متن کامل

Compiler-assisted Hybrid Operand Communication

Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many cons...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000